Results 1 - 11 of 11
1.
J Bioinform Comput Biol ; 21(3): 2350013, 2023 06.
Article in English | MEDLINE | ID: mdl-37350314

ABSTRACT

Precision medicine has become a global trend in medical development, and cancer diagnosis plays an important role in it. Accurate cancer diagnosis allows patients to receive appropriate treatment, which improves their survival. Because disease development involves complex interplay among multiple factors, such as gene-gene interactions, cancer classification based on microarray gene expression profiling data is expected to be effective and has therefore attracted extensive attention in computational biology and medicine. However, building a diagnostic model from genomic data raises several problems, including a high-dimensional feature space and feature contamination. In this paper, we propose using the overlapping group screening (OGS) approach within a logistic regression framework to build an accurate cancer diagnosis model and to predict the probability that a patient falls into a given disease classification category. The proposal integrates gene pathway information into the procedure for identifying genes and gene-gene interactions associated with the classification of cancer outcome groups. In a series of simulation studies comparing the predictive accuracy of the proposed method with that of several existing machine learning methods, the proposed method performed better. We apply the proposed method to The Cancer Genome Atlas genomic data for lung adenocarcinoma (LUAD), liver hepatocellular carcinoma (LHC), and thyroid carcinoma (THCA) to establish accurate cancer diagnosis models.


Subjects
Early Detection of Cancer , Neoplasms , Humans , Gene Expression Profiling/methods , Genomics , Computer Simulation , Neoplasms/genetics
2.
Front Biosci (Landmark Ed) ; 27(8): 225, 2022 07 18.
Article in English | MEDLINE | ID: mdl-36042165

ABSTRACT

BACKGROUND: In biomedical and epidemiological studies, gene-environment (G-E) interactions play an important role in the etiology and progression of many complex diseases. For ultra-high-dimensional survival genomic data, two common approaches, marginal and joint models, have been proposed to identify important interaction biomarkers. Most existing methods for detecting G-E interactions (the marginal Cox model and the marginal accelerated failure time model) lack robustness to contamination and outliers in the response outcome and in the predictive biomarkers. In particular, right-censored survival outcomes and an ultra-high-dimensional feature space make relevant feature screening even more challenging. METHODS: In this paper, we use the non-parametric Kendall's partial correlation to obtain a "pure" correlation, adjusted for other variables, that determines the importance of G-E interactions for clinical survival data under a marginal modeling framework. RESULTS: We conducted simulations under a series of scenarios to compare the performance of the proposed method (Kendall's partial correlation) with that of commonly used methods (the marginal Cox model, the marginal accelerated failure time model, and the censoring quantile partial correlation approach). In real data applications, we used Kendall's partial correlation to identify G-E interactions related to the clinical survival of patients with esophageal, pancreatic, and lung carcinomas in The Cancer Genome Atlas clinical survival genetic data, and further established survival prediction models. CONCLUSIONS: Overall, both the simulations at a medium censoring level and the real data studies show that our method performs well and outperforms existing methods in the selection, estimation, and prediction accuracy of main and interaction biomarkers. These applications reveal the advantages of the non-parametric Kendall's partial correlation approach over alternative semi-parametric marginal modeling methods. We also identified cancer-related G-E interaction biomarkers and reported the corresponding coefficients with p-values.
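Kendall's partial correlation can be illustrated with the classical first-order partial tau formula, tau(x,y|z) = (tau_xy - tau_xz * tau_yz) / sqrt((1 - tau_xz^2) * (1 - tau_yz^2)). The sketch below is a minimal Python version under stated assumptions: it uses uncensored toy data, the tau-a estimator without tie correction, and conditions on a single variable z. It does not reproduce the censoring adjustment or the conditioning on several main effects described in the abstract, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (no tie correction)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    return s / (n * (n - 1) / 2)


def kendall_partial(x, y, z):
    """First-order partial Kendall's tau of x and y, controlling for z."""
    txy, txz, tyz = kendall_tau(x, y), kendall_tau(x, z), kendall_tau(y, z)
    denom = sqrt((1 - txz ** 2) * (1 - tyz ** 2))
    if denom == 0:
        raise ZeroDivisionError("z is perfectly concordant with x or y")
    return (txy - txz * tyz) / denom


# Toy example: x could play the role of a G*E product term, y an (uncensored)
# survival time, and z a main effect that is partialled out.
print(kendall_partial([1, 2, 3, 4], [1, 2, 3, 4], [1, 3, 2, 4]))
```

In this toy case x and y are perfectly concordant, so controlling for z leaves the partial tau at 1; in general, the partial tau shrinks toward 0 when the x-y association is explained by z.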


Subjects
Gene-Environment Interaction , Lung Neoplasms , Computer Simulation , Genomics , Humans , Lung Neoplasms/genetics
3.
BMC Bioinformatics ; 23(1): 202, 2022 May 30.
Article in English | MEDLINE | ID: mdl-35637439

ABSTRACT

BACKGROUND: In biomedical and epidemiological research, gene-environment (G-E) interaction is of great significance to the etiology and progression of many complex diseases. For high-dimensional genetic data, two general classes of models, marginal and joint models, have been proposed to identify important interaction factors. Most existing approaches for identifying G-E interactions are limited by a lack of robustness to outliers and contamination in the response and predictor data. In particular, right-censored survival outcomes make the associated feature screening even more challenging. In this article, we use the overlapping group screening (OGS) approach, which incorporates gene pathway information under a joint modeling framework, to select important G-E interactions related to clinical survival outcomes. RESULTS: Simulation studies under various scenarios were carried out to compare the performance of the proposed method with that of commonly used methods. In real data applications, we used the proposed method to identify G-E interactions related to the clinical survival outcomes of patients with head and neck squamous cell carcinoma and esophageal carcinoma in The Cancer Genome Atlas clinical survival genetic data, and further established corresponding survival prediction models. Both simulation and real data studies show that our method performs well and outperforms existing methods in G-E interaction selection, effect estimation, and survival prediction accuracy. CONCLUSIONS: The OGS approach is useful for selecting important environmental factors, genes, and G-E interactions in an ultra-high-dimensional feature space. The prediction ability of OGS with the Lasso penalty is better than that of existing methods. The same idea can be applied to other outcome models, such as the proportional odds survival time model, the logistic regression model for binary outcomes, and the multinomial logistic regression model for multi-class outcomes.


Subjects
Gene-Environment Interaction , Neoplasms , Computer Simulation , Genomics , Humans , Neoplasms/genetics , Research
4.
Anal Chim Acta ; 1208: 339814, 2022 May 22.
Article in English | MEDLINE | ID: mdl-35525585

ABSTRACT

Metabolism studies are an important step in pharmaceutical research. Approaches that combine LC-MS with metabolomics data processing have been developed for rapid screening of drug metabolites. The mass defect filter (MDF) is one such LC/MS-based metabolomics data-processing approach and has been applied to screen drug metabolites. Although MDF can remove most interference ions from an incubation sample, the true positive rate among the retained ions is relatively low (approximately 10%). To improve the efficacy of MDF, we developed a two-stage data-processing approach that combines MDF with stable isotope tracing (SIT) for metabolite identification. Pioglitazone (PIO), an antidiabetic drug used to treat type 2 diabetes mellitus, was taken as the example drug. Our results demonstrate that the new approach substantially increases the validated rate, from about 10% to 74%. Most of the validated metabolite signals (13/14) could be verified as PIO structure-related metabolites. In addition, we applied the approach to identify uncommon metabolite signals (mass changes beyond a 50 Da window around the parent drug; MDF1). SIT removed most (approximately 98%) of the interference ions identified by MDF1, and four of the five validated metabolite signals could be verified as PIO structure-related metabolites. Interestingly, many of the verified metabolites (10/17) were novel PIO metabolites. Among these novel metabolites, nine were thiazolidinedione ring-opening signals that might be related to the toxicity of PIO. Compared with MDF alone, our approach significantly improved the efficacy of drug metabolite identification.
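The two-stage idea can be sketched as two successive filters over a peak list. The code below is a simplified illustration, not the authors' pipeline: stage 1 (MDF) keeps ions whose m/z lies within 50 Da of the parent and whose mass defect is close to the parent's; stage 2 (SIT) keeps only ions that also have an isotope-labelled partner peak. The 40 mDa defect tolerance, the 10 ppm matching tolerance, the label shift of 4.0251 Da (roughly four deuterium atoms), and the peak list are all assumptions; the parent m/z of 357.1267 is the [M+H]+ value computed from pioglitazone's formula, C19H20N2O3S.

```python
def mass_defect(mz):
    """Mass defect: difference between an m/z value and its nearest integer (nominal) mass."""
    return mz - round(mz)


def mdf_filter(ions, parent_mz, defect_tol=0.040, mass_window=50.0):
    """Stage 1 (MDF): keep ions within +/- mass_window Da of the parent whose mass
    defect lies within +/- defect_tol Da of the parent's mass defect."""
    parent_defect = mass_defect(parent_mz)
    return [
        mz for mz in ions
        if abs(mz - parent_mz) <= mass_window
        and abs(mass_defect(mz) - parent_defect) <= defect_tol
    ]


def sit_filter(candidates, all_ions, label_shift, ppm=10.0):
    """Stage 2 (SIT): keep candidates that have a labelled partner ion at
    m/z + label_shift, matched within a ppm tolerance."""
    kept = []
    for mz in candidates:
        target = mz + label_shift
        tol = target * ppm * 1e-6
        if any(abs(other - target) <= tol for other in all_ions):
            kept.append(mz)
    return kept


parent = 357.1267  # assumed [M+H]+ of pioglitazone (C19H20N2O3S)
peaks = [357.1267, 361.1518, 373.1216, 377.1467, 389.1165, 360.9000, 420.0000]  # toy data
stage1 = mdf_filter(peaks, parent)
stage2 = sit_filter(stage1, peaks, label_shift=4.0251)  # hypothetical d4 label
print(stage1)  # -> [357.1267, 361.1518, 373.1216, 377.1467, 389.1165]
print(stage2)  # -> [357.1267, 373.1216]
```

In this toy list, MDF rejects one ion for its mass defect and one for lying outside the 50 Da window, but it still keeps the labelled partner peaks and an unpaired ion; SIT then removes those, leaving only the signals that appear as unlabelled/labelled pairs.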


Subjects
Diabetes Mellitus, Type 2 , Chromatography, Liquid/methods , Humans , Isotopes , Mass Spectrometry/methods , Metabolomics/methods
5.
PeerJ ; 10: e13098, 2022.
Article in English | MEDLINE | ID: mdl-35291482

ABSTRACT

Background: In high-dimensional survival genomic data, identifying cancer-related genes is a challenging and important subject in bioinformatics. In recent years, many feature screening approaches for survival outcomes in high-dimensional survival genomic data have been developed; however, few studies have compared these methods systematically. The primary purpose of this article is to compare them systematically through a series of simulation studies; the secondary purpose is to use these feature screening methods to establish more accurate prediction models for patient survival based on the survival genomic datasets of The Cancer Genome Atlas (TCGA). Results: Simulation studies show that the network-adjusted feature screening measure performs well and outperforms existing popular univariate independent feature screening methods. In real data applications, we show that the network-adjusted feature screening approach leads to more accurate survival prediction than alternative methods that do not account for gene-gene dependency information. We also use TCGA clinical survival genetic data to identify biomarkers associated with clinical survival outcomes in patients with various cancers, including esophageal, pancreatic, head and neck squamous cell, lung, and breast invasive carcinomas. Conclusions: These applications reveal the advantages of the network-adjusted feature selection method over alternative methods that do not consider gene-gene dependency information. We also identify cancer-related genes, nearly all of which have been reported in the literature. These results indicate that the network-based screening method is reliable and credible.


Subjects
Breast Neoplasms , Genomics , Humans , Female , Genomics/methods , Computational Biology/methods , Breast Neoplasms/diagnosis , Computer Simulation , Epithelial Cells
6.
Bioinformatics ; 37(15): 2150-2156, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33595070

ABSTRACT

MOTIVATION: In high-dimensional genetic/genomic data, identifying genes related to a clinical survival trait is a challenging and important issue. In particular, right-censored survival outcomes and contaminated biomarker data make the relevant feature screening difficult. Several independence screening methods have been developed, but they fail to account for gene-gene dependency information and may be sensitive to outlying feature data. RESULTS: We improve the inverse probability-of-censoring weighted (IPCW) Kendall's tau statistic by using Google's PageRank Markov matrix to incorporate feature dependency network information. To handle outlying feature data, the nonparanormal approach, which transforms the feature data to multivariate normal variates, is used in the graphical lasso procedure to estimate the network structure of the feature data. Simulation studies under various scenarios show that the proposed network-adjusted weighted Kendall's tau approach leads to more accurate feature selection and survival prediction than methods that do not account for feature dependency network information and outlying feature data. Applications to the clinical survival outcome data of patients with diffuse large B-cell lymphoma and of The Cancer Genome Atlas lung adenocarcinoma patients clearly demonstrate the advantages of the new proposal over alternative methods. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
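The abstract does not give the exact construction, so the following is only a generic sketch of how a PageRank-type Markov matrix can propagate marginal screening scores over a gene network. It assumes a personalized-PageRank update s = (1 - d) p + d W s, where p holds the normalized marginal scores (for example |tau_j|), W is the column-normalized adjacency matrix of an already-estimated network, and d = 0.85 is the conventional damping factor. The function name and interface are hypothetical, and the network estimation step (nonparanormal transformation plus graphical lasso) is not shown.

```python
def network_adjusted_scores(marginal, adjacency, d=0.85, tol=1e-10, max_iter=1000):
    """Personalized-PageRank smoothing of marginal screening scores over a gene network.

    marginal  : nonnegative scores (e.g. |tau_j|), one per gene
    adjacency : symmetric 0/1 matrix (list of lists) of an estimated gene network
    d         : damping factor; d = 0 returns the normalized marginal scores
    """
    p_total = sum(marginal)
    if p_total <= 0:
        raise ValueError("marginal scores must have a positive sum")
    p = [m / p_total for m in marginal]  # personalization (teleport) vector
    n = len(p)
    deg = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]  # column sums
    s = p[:]
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # Score flowing into gene i from its neighbours j, split by j's degree.
            flow = sum(adjacency[i][j] * s[j] / deg[j] for j in range(n) if deg[j] > 0)
            new.append((1 - d) * p[i] + d * flow)
        # Isolated genes pass no score on, so renormalize to keep a total of 1.
        total = sum(new)
        new = [v / total for v in new]
        if max(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s


# Toy network: genes 0 and 1 are linked, gene 2 is isolated.
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
scores = network_adjusted_scores([1.0, 0.0, 0.0], adjacency)
print([round(v, 4) for v in scores])
```

Here gene 1 has no marginal signal of its own but receives a substantial adjusted score through its link to gene 0, while the isolated gene 2 stays at zero; this is the kind of dependency information that univariate screening ignores.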

7.
Bioinformatics ; 36(9): 2763-2769, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31926011

ABSTRACT

MOTIVATION: In gene expression and genome-wide association studies, identifying interaction effects is an important and challenging issue owing to its ultrahigh-dimensional nature. In particular, contaminated data and right-censored survival outcomes make the associated feature screening even more challenging. RESULTS: In this article, we propose an inverse probability-of-censoring weighted Kendall's tau statistic to measure the association of a survival trait with biomarkers, as well as a Kendall's partial correlation statistic to measure the relationship of a survival trait with an interaction variable conditional on the main effects. The Kendall's partial correlation is then used to conduct interaction screening. Simulation studies under various scenarios are performed to compare the performance of our proposal with that of some commonly available methods. In the real data application, we use our proposed method to identify epistasis associated with the clinical survival outcomes of patients with non-small-cell lung cancer, diffuse large B-cell lymphoma, and lung adenocarcinoma. Both simulation and real data studies demonstrate that our method performs well and outperforms existing methods in identifying main and interaction biomarkers. AVAILABILITY AND IMPLEMENTATION: The R package 'IPCWK' implements this method and includes a reference manual describing how to use the package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
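IPCW statistics reweight uncensored observations by the inverse of the estimated probability of remaining uncensored. The sketch below shows only that standard ingredient, assuming the censoring distribution G(t) = P(C > t) is estimated by the Kaplan-Meier method with the censoring indicator reversed, and that each subject receives the weight delta_i / G(Y_i-). How the paper combines such weights into its pairwise Kendall's tau statistic is not reproduced, and the function names are hypothetical (they are not the 'IPCWK' package's API).

```python
def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Censorings (event == 0) are treated as the 'events' of interest.
    Returns a function giving G just before t, i.e. G(t-).
    """
    data = sorted(zip(times, events))
    steps = []  # (time, value of G just after that time)
    g = 1.0
    n_at_risk = len(data)
    i = 0
    while i < len(data):
        t = data[i][0]
        censored = at_t = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            censored += data[i][1] == 0
            at_t += 1
            i += 1
        if censored:
            g *= 1 - censored / n_at_risk
        steps.append((t, g))
        n_at_risk -= at_t

    def g_minus(t):
        value = 1.0
        for s, gs in steps:
            if s < t:
                value = gs
            else:
                break
        return value

    return g_minus


def ipcw_weights(times, events):
    """Weight delta_i / G(Y_i-): zero for censored subjects, >= 1 for observed events."""
    G = km_censoring_survival(times, events)
    return [e / G(t) if e else 0.0 for t, e in zip(times, events)]


times = [1, 2, 3, 4]   # toy follow-up times
events = [1, 0, 1, 1]  # 1 = event observed, 0 = censored
print(ipcw_weights(times, events))
```

In this toy example the censored subject gets weight 0, and the two events observed after the censoring time are up-weighted by 1 / (2/3) = 1.5, so they stand in for the subject whose event time was lost to censoring.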


Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Genome-Wide Association Study , Humans , Lung Neoplasms/genetics , Phenotype
8.
Lifetime Data Anal ; 26(2): 292-314, 2020 04.
Article in English | MEDLINE | ID: mdl-31065967

ABSTRACT

Assuming Cox's regression model, we consider a penalized full-likelihood approach to variable selection under nested case-control (NCC) sampling. Penalized non-parametric maximum likelihood estimates (PNPMLEs) are characterized by self-consistency equations derived from score functions. A cross-validation method based on the profile likelihood is used to choose the tuning parameter within a family of penalty functions. Simulation studies indicate that, under NCC sampling, the numerical performance of the (P)NPMLE is better than that of the weighted partial likelihood in estimating the log-relative risk and in identifying the covariates and the model. LASSO performs best when the cohort size is small; SCAD performs best when the cohort size is large and may eventually perform as well as the oracle estimator. Using the SCAD penalty, we establish the consistency, asymptotic normality, and oracle properties of the PNPMLE, as well as the sparsity property of the penalty. We also propose a consistent estimate of the asymptotic variance using the observed profile likelihood. We illustrate the method by analyzing liver cancer diagnoses in a dataset of patients with type 2 diabetes mellitus in Taiwan who were treated with thiazolidinediones.
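The LASSO and SCAD penalties compared in the abstract have standard closed forms. The sketch below evaluates both for a single coefficient, using the usual SCAD definition of Fan and Li (2001) with their suggested default a = 3.7. It shows why SCAD behaves like LASSO near zero but stops growing for large effects, which is what allows oracle-type properties. It does not implement the penalized NPMLE or the NCC likelihood, and the function names are hypothetical.

```python
def lasso_penalty(theta, lam):
    """L1 (LASSO) penalty: lam * |theta|."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001) evaluated at |theta|; requires a > 2."""
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    t = abs(theta)
    if t <= lam:
        return lam * t  # identical to LASSO near zero
    if t <= a * lam:
        # Quadratic piece joining the linear and constant regions smoothly.
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2  # constant: large effects are not shrunk further


for theta in (0.5, 2.0, 10.0):
    print(theta, lasso_penalty(theta, 1.0), scad_penalty(theta, 1.0))
```

With lam = 1, both penalties equal 0.5 at theta = 0.5, but at theta = 10 the LASSO penalty is 10 while SCAD has levelled off at about 2.35, so a large true coefficient is estimated with little bias.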


Subjects
Likelihood Functions , Proportional Hazards Models , Sampling Studies , Algorithms , Female , Humans , Male , Statistics, Nonparametric
9.
BMC Bioinformatics ; 19(1): 335, 2018 Sep 21.
Article in English | MEDLINE | ID: mdl-30241463

ABSTRACT

BACKGROUND: The development of a disease is a complex process that may result from the joint effects of multiple genes. In this article, we propose the overlapping group screening (OGS) approach to determining active genes and gene-gene interactions by incorporating prior pathway information. The OGS method is designed to overcome two challenges in genome-wide data analysis: the number of genes and gene-gene interactions is far greater than the sample size, and pathways generally overlap with one another. The OGS method is further proposed for predicting patients' survival from gene expression data. RESULTS: Simulation studies demonstrate that the OGS approach performs well in identifying the true main and interaction effects, and that the survival prediction accuracy of OGS with the Lasso penalty is better than that of the ordinary Lasso method. In real data analysis, we identify several significant genes and/or epistatic interactions associated with the clinical survival outcomes of diffuse large B-cell lymphoma (DLBCL) and non-small-cell lung cancer (NSCLC), using prior pathway information from the KEGG pathway and the GO biological process databases, respectively. CONCLUSIONS: The OGS approach is useful for selecting important genes and epistatic interactions in an ultra-high-dimensional feature space. The prediction ability of OGS with the Lasso penalty is better than that of existing methods. The OGS approach is generally applicable to various types of outcome data (quantitative, qualitative, and censored event time data) and regression models (e.g., linear, logistic, and Cox's regression models).
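One way prior pathway information can shrink the interaction search space is to consider only gene pairs that share a pathway, while keeping track of which (possibly overlapping) pathway groups each feature belongs to. The sketch below illustrates only that bookkeeping step, under the assumption that interactions are restricted to within-pathway pairs; the OGS screening statistic and group penalty are not reproduced, and the function name and pathway data are hypothetical.

```python
from itertools import combinations


def candidate_features(pathways):
    """Enumerate main effects and within-pathway pairwise interactions.

    pathways : dict mapping pathway name -> iterable of gene names (pathways may overlap)
    Returns (main_effects, interactions, groups). Each group lists the genes and gene
    pairs of one pathway; a feature shared by overlapping pathways appears in several
    groups but only once in the global candidate sets.
    """
    main_effects, interactions, groups = set(), set(), {}
    for name, genes in pathways.items():
        genes = sorted(set(genes))
        pairs = list(combinations(genes, 2))  # only pairs within the same pathway
        groups[name] = genes + pairs
        main_effects.update(genes)
        interactions.update(pairs)
    return main_effects, interactions, groups


pathways = {"P1": ["A", "B", "C"], "P2": ["C", "D"]}  # toy pathways sharing gene C
mains, pairs, groups = candidate_features(pathways)
print(sorted(mains))  # -> ['A', 'B', 'C', 'D']
print(sorted(pairs))  # -> [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

With four genes, all pairs would give 6 candidate interactions; restricting to shared pathways leaves 4. Gene C belongs to both groups, which is the overlap that an overlapping-group method has to handle.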


Subjects
Carcinoma, Non-Small-Cell Lung/mortality , Epistasis, Genetic , Genetic Loci , Lung Neoplasms/mortality , Lymphoma, Large B-Cell, Diffuse/mortality , Transcriptome , Algorithms , Carcinoma, Non-Small-Cell Lung/genetics , Computer Simulation , Databases, Factual , Gene Expression Profiling , Humans , Lung Neoplasms/genetics , Lymphoma, Large B-Cell, Diffuse/genetics , Predictive Value of Tests , Survival Rate
10.
Cancer Epidemiol ; 53: 42-48, 2018 04.
Article in English | MEDLINE | ID: mdl-29396159

ABSTRACT

INTRODUCTION: In high-income countries, advances in early diagnosis and treatment have improved cancer survival. However, socioeconomic inequalities in survival have persisted or increased for some adult cancers. MATERIALS AND METHODS: We assessed net survival for the 20 most common adult cancers in Taiwan, stratified into six age groups and three socioeconomic groups. RESULTS: Of the 120 combinations of cancer site and age group, 49 showed improvements in 5-year net survival from 2000-2004 to 2005-2010. Only cancer of the cervix uteri in the 35-49-year age group showed a deterioration. During 2000-2010, 13 of the 20 cancers showed socioeconomic inequalities for all age groups combined, and the deprivation gaps varied with cancer site and age at diagnosis. For the five most common cancers (liver, colon and rectum, lung, breast, and oral), there were socioeconomic inequalities, and 5-year net survival improved for most or all of the six age groups from 2000-2004 to 2005-2010. CONCLUSION: Reducing socioeconomic inequality in survival may improve overall survival. Efforts should focus on the age groups with large deprivation gaps. Our results are useful for prioritizing cancer sites and age groups for in-depth studies of socioeconomic disparities and for proposing interventions to reduce health disparities and improve net cancer survival.


Subjects
Neoplasms/mortality , Socioeconomic Factors , Adult , Aged , Female , Humans , Income , Male , Middle Aged , Taiwan/epidemiology
11.
Bioinformatics ; 33(22): 3595-3602, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-28651334

ABSTRACT

MOTIVATION: Identifying single nucleotide polymorphism (SNP) interactions is an important and challenging topic in genome-wide association studies (GWAS). Many approaches have been applied to detect whole-genome interactions. However, these approaches tend to miss causal interaction effects when the individual marginal effects are uncorrelated with the trait while their interaction effects are highly associated with it. RESULTS: A grouped variable selection technique, called two-stage grouped sure independence screening (TS-GSIS), is developed to study interactions that may not have marginal effects. TS-GSIS is shown to be very helpful in identifying not only causal SNPs whose effects are uncorrelated with the trait but also their corresponding SNP-SNP interaction effects. TS-GSIS improves the detection of interaction effects by using the joint information among SNPs and by determining the size of the candidate sets in the model. Simulation studies under various scenarios are performed to compare the performance of TS-GSIS with that of current approaches. We also apply our approach to a real rheumatoid arthritis (RA) dataset. Both the simulation and real data studies show that TS-GSIS performs very well in detecting SNP-SNP interactions. AVAILABILITY AND IMPLEMENTATION: The R package is distributed through CRAN and is available at: https://cran.r-project.org/web/packages/TSGSIS/index.html. CONTACT: hsiung@nhri.org.tw. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
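The motivating problem, an interaction whose individual SNPs show no marginal correlation with the trait, can be reproduced with a tiny example. The sketch below assumes a balanced -1/+1 SNP coding (real genotypes are usually coded 0/1/2) and a purely multiplicative trait; the "group score" (the largest absolute correlation of the trait with any member of the group {x_j, x_k, x_j * x_k}) is a hypothetical simplification for illustration, not the TS-GSIS statistic or its two-stage procedure.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation (returns 0.0 if either vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def pair_group_score(xj, xk, y):
    """Hypothetical grouped score for a SNP pair: the largest |correlation| of the
    trait with any member of the group {x_j, x_k, x_j * x_k}."""
    prod = [a * b for a, b in zip(xj, xk)]
    return max(abs(pearson(v, y)) for v in (xj, xk, prod))


x1 = [-1, -1, 1, 1]  # toy SNP, coded -1/+1
x2 = [-1, 1, -1, 1]  # toy SNP, coded -1/+1
y = [a * b for a, b in zip(x1, x2)]  # trait driven only by the interaction
print(pearson(x1, y), pearson(x2, y))  # -> 0.0 0.0
print(pair_group_score(x1, x2, y))     # -> 1.0
```

Marginal screening would rank both SNPs at the bottom here, whereas scoring the pair as a group recovers the signal; this is the gap that grouped screening is meant to close.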


Subjects
Epistasis, Genetic , Genome-Wide Association Study/methods , Models, Genetic , Polymorphism, Single Nucleotide , Software , Algorithms , Arthritis, Rheumatoid/genetics , Computer Simulation , Genetic Predisposition to Disease , Humans , Phenotype